In Regexes
See primary documentation in context for Enumerated character classes and ranges
Sometimes the pre-existing wildcards and character classes are not enough. Fortunately, defining your own is fairly simple. Within <[ ]>, you can put any number of single characters and ranges of characters (expressed with two dots between the end points), with or without whitespace.
"abacabadabacaba" ~~ / <[ a .. c 1 2 3 ]>* /;
# Unicode hex codepoint range
"ÀÁÂÃÄÅÆ" ~~ / <[ \x[00C0] .. \x[00C6] ]>* /;
# Unicode named codepoint range
"αβγ" ~~ /<[\c[GREEK SMALL LETTER ALPHA]..\c[GREEK SMALL LETTER GAMMA]]>*/;
# Non-alphanumeric
'$@%!' ~~ /<[ ! @ $ % ]>+/ # OUTPUT: «「$@%!」»
As the last line above illustrates, within <[ ]> you do not need to quote or escape most non-alphanumeric characters the way you do in regex text outside of <[ ]>. You do, however, need to escape the much smaller set of characters that have special meaning within <[ ]>, such as \, [, and ].
To escape characters that would have some meaning inside the <[ ]>, precede the character with a \.
say "[ hey ]" ~~ /<-[ \] \[ \s ]>+/; # OUTPUT: «「hey」»
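As a small sketch of the escaping rule, the snippet below collects every backslash and square bracket from a string. The sample string and the variable names are illustrative assumptions, not part of the original documentation.

```raku
# Hypothetical sample text containing the three characters that need escaping.
my $text = 'a[1]\\b';          # the string is: a[1]\b

# Inside <[ ]>, a backslash, [ and ] are each written with a leading backslash.
my @found = $text.comb(/ <[ \\ \[ \] ]> /);

say @found.join(' ');          # OUTPUT: «[ ] \␤»
```

Using comb here is only one convenient way to list the matches; a plain smartmatch against the same class would work as well.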
You do not have the option of quoting special characters inside a <[ ]>; a ' just matches a literal '.
Within the < > you can use + and - to add or remove multiple range definitions and even mix in some of the Unicode categories above. You can also write the backslashed forms for character classes between the [ ].
/ <[\d] - [13579]> /;
# starts with \d and removes odd ASCII digits, but not quite the same as
/ <[02468]> /;
# because the first one also contains "weird" unicodey digits
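The difference between the two classes can be checked with a non-ASCII digit. The following is a minimal sketch; the choice of U+0664 (ARABIC-INDIC DIGIT FOUR) as the sample character and the variable names are assumptions made for illustration.

```raku
# Assumed sample characters: one ASCII digit and one non-ASCII decimal digit.
my $ascii-even   = '4';
my $arabic-indic = "\x[0664]";   # ARABIC-INDIC DIGIT FOUR

say so $ascii-even   ~~ / ^ <[\d] - [13579]> $ /;  # OUTPUT: «True␤»
say so $arabic-indic ~~ / ^ <[\d] - [13579]> $ /;  # OUTPUT: «True␤»  (a \d that is not in [13579])
say so $arabic-indic ~~ / ^ <[02468]> $ /;         # OUTPUT: «False␤» (only five ASCII digits listed)
say so '3'           ~~ / ^ <[\d] - [13579]> $ /;  # OUTPUT: «False␤» (removed by the subtraction)
```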
You can include Unicode properties in the list as well:
/<:Zs + [\x9] - [\xA0] - [\x202F] >/
# Any character with "Zs" property, or a tab, but not a "no-break space" or "narrow no-break space"
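To see how the additions and subtractions combine, the same class can be tried against a few single characters. This is a sketch only; the variable name $space-like and the anchoring with ^ and $ are assumptions added for the demonstration.

```raku
# Hypothetical name for the class shown above, anchored to test one character at a time.
my $space-like = / ^ <:Zs + [\x9] - [\xA0] - [\x202F]> $ /;

say so " "        ~~ $space-like;  # OUTPUT: «True␤»  (U+0020 SPACE has the Zs property)
say so "\t"       ~~ $space-like;  # OUTPUT: «True␤»  (tab, added with + [\x9])
say so "\xA0"     ~~ $space-like;  # OUTPUT: «False␤» (no-break space, subtracted)
say so "\x[202F]" ~~ $space-like;  # OUTPUT: «False␤» (narrow no-break space, subtracted)
say so "a"        ~~ $space-like;  # OUTPUT: «False␤» (never part of the class)
```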
To negate a character class, put a - after the opening angle bracket:
say 'no quotes' ~~ / <-[ " ]> + /; # <-["]> matches any character except "
A common pattern for parsing quote-delimited strings involves negated character classes:
say '"in quotes"' ~~ / '"' <-[ " ]> * '"'/;
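In practice the text between the quotes is usually what you want to keep. A minimal sketch, assuming a made-up input line and using a positional capture around the negated class:

```raku
# Hypothetical input line; only the quoted part is of interest.
my $line = 'name = "in quotes" trailing text';

# The parentheses capture everything between the two quote characters.
if $line ~~ / '"' ( <-[ " ]>* ) '"' / {
    say ~$0;   # OUTPUT: «in quotes␤»
}
```

Because the negated class can never consume a quote, the match stops at the first closing quote rather than running on to a later one.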
This regex first matches a quote, then any characters that aren't quotes, and then a quote again. The meanings of * and + in the examples above are explained in the next section on quantifiers.
Just as you can use the - for both set difference and negation of a single value, you can also explicitly put a + in front:
/ <+[123]> / # same as <[123]>